[LLVM][IR] Add location tracking to LLVM IR parser #155797

Bertik23 · 2025-08-28T09:12:47Z

This PR is part of the LLVM IR LSP server project (RFC)

To be able to make a LSP server, it's crucial to have location information about the LLVM objects (Functions, BasicBlocks and Instructions).

This PR adds:

Position tracking to the Lexer
A new AsmParserContext class, to hold the new position info
Tests to check if the location is correct

The AsmParserContext can be passed as an optional parameter into the parser. Which populates it and it can be then used by other tools, such as the LSP server.

The AsmParserContext idea was borrowed from MLIR. As we didn't want to store data no one else uses inside the objects themselves. But the implementation is different, this class holds several maps of Functions, BasicBlocks and Instructions, to map them to their location.

And some utility methods were added to get the positions of the processed tokens.

Co-authored-by: Nikita Popov <[email protected]>

github-actions · 2025-08-28T09:16:06Z

✅ With the latest revision this PR passed the C/C++ code formatter.

llvmbot · 2025-08-28T11:42:35Z

@llvm/pr-subscribers-llvm-ir

Author: Bertik23 (Bertik23)

Changes

This PR is part of the LLVM IR LSP server project (RFC)

To be able to make a LSP server, it's crutial to have location information about the LLVM objects (Functions, BasicBlocks and Instructions).

This PR adds:

Position tracking to the Lexer
A new AsmParserContext class, to hold the new position info
Fixes some OOB issues in the Lexer
Tests to check if the location is correct

The AsmParserContext can be passed as an optional parameter into the parser. Which populates it and it can be then used by other tools, such as the LSP server.

The AsmParserContext idea was borrowed from MLIR. As we didn't want to store data no one else uses inside the objects themselves. But the implementation is different, this class holds several maps of Functions, BasicBlocks and Instructions, to map them to their location.

The Lexer had all of it's manual pointer increments changed to getNextChar() which checks for linebreaks and updates the line and column counters accordingly.
And some utility methods were added to get the positions of the processed tokens.

Patch is 39.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155797.diff

13 Files Affected:

(added) llvm/include/llvm/AsmParser/AsmParserContext.h (+53)
(modified) llvm/include/llvm/AsmParser/LLLexer.h (+35-3)
(modified) llvm/include/llvm/AsmParser/LLParser.h (+7-2)
(modified) llvm/include/llvm/AsmParser/Parser.h (+9-7)
(modified) llvm/include/llvm/IR/Value.h (+32)
(modified) llvm/include/llvm/IRReader/IRReader.h (+9-8)
(added) llvm/lib/AsmParser/AsmParserContext.cpp (+91)
(modified) llvm/lib/AsmParser/CMakeLists.txt (+1)
(modified) llvm/lib/AsmParser/LLLexer.cpp (+108-61)
(modified) llvm/lib/AsmParser/LLParser.cpp (+24-3)
(modified) llvm/lib/AsmParser/Parser.cpp (+19-12)
(modified) llvm/lib/IRReader/IRReader.cpp (+9-4)
(modified) llvm/unittests/AsmParser/AsmParserTest.cpp (+60)

diff --git a/llvm/include/llvm/AsmParser/AsmParserContext.h b/llvm/include/llvm/AsmParser/AsmParserContext.h
new file mode 100644
index 0000000000000..78ea32ac7ca08
--- /dev/null
+++ b/llvm/include/llvm/AsmParser/AsmParserContext.h
@@ -0,0 +1,53 @@
+//===-- AsmParserContext.h --------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_ASMPARSER_ASMPARSER_STATE_H
+#define LLVM_ASMPARSER_ASMPARSER_STATE_H
+
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/IR/Value.h"
+#include <optional>
+
+namespace llvm {
+
+/// Registry of file location information for LLVM IR constructs
+///
+/// This class provides access to the file location information
+/// for various LLVM IR constructs. Currently, it supports Function,
+/// BasicBlock and Instruction locations.
+///
+/// When available, it can answer queries about what is at a given
+/// file location, as well as where in a file a given IR construct
+/// is.
+///
+/// This information is optionally emitted by the LLParser while
+/// it reads LLVM textual IR.
+class AsmParserContext {
+public:
+  std::optional<FileLocRange> getFunctionLocation(const Function *) const;
+  std::optional<FileLocRange> getBlockLocation(const BasicBlock *) const;
+  std::optional<FileLocRange> getInstructionLocation(const Instruction *) const;
+  std::optional<Function *> getFunctionAtLocation(const FileLocRange &) const;
+  std::optional<Function *> getFunctionAtLocation(const FileLoc &) const;
+  std::optional<BasicBlock *> getBlockAtLocation(const FileLocRange &) const;
+  std::optional<BasicBlock *> getBlockAtLocation(const FileLoc &) const;
+  std::optional<Instruction *>
+  getInstructionAtLocation(const FileLocRange &) const;
+  std::optional<Instruction *> getInstructionAtLocation(const FileLoc &) const;
+  bool addFunctionLocation(Function *, const FileLocRange &);
+  bool addBlockLocation(BasicBlock *, const FileLocRange &);
+  bool addInstructionLocation(Instruction *, const FileLocRange &);
+
+private:
+  DenseMap<Function *, FileLocRange> Functions;
+  DenseMap<BasicBlock *, FileLocRange> Blocks;
+  DenseMap<Instruction *, FileLocRange> Instructions;
+};
+} // namespace llvm
+
+#endif
diff --git a/llvm/include/llvm/AsmParser/LLLexer.h b/llvm/include/llvm/AsmParser/LLLexer.h
index 501a7aefccd7f..5008ef029f3ff 100644
--- a/llvm/include/llvm/AsmParser/LLLexer.h
+++ b/llvm/include/llvm/AsmParser/LLLexer.h
@@ -29,6 +29,20 @@ namespace llvm {
     const char *CurPtr;
     StringRef CurBuf;
 
+    // The line number at `CurPtr-1`, zero-indexed
+    unsigned CurLineNum = 0;
+    // The column number at `CurPtr-1`, zero-indexed
+    unsigned CurColNum = -1;
+    // The line number of the start of the current token, zero-indexed
+    unsigned CurTokLineNum = 0;
+    // The column number of the start of the current token, zero-indexed
+    unsigned CurTokColNum = 0;
+    // The line number of the end of the current token, zero-indexed
+    unsigned PrevTokEndLineNum = -1;
+    // The column number of the end (exclusive) of the current token,
+    // zero-indexed
+    unsigned PrevTokEndColNum = -1;
+
     enum class ErrorPriority {
       None,   // No error message present.
       Parser, // Errors issued by parser.
@@ -62,9 +76,7 @@ namespace llvm {
     explicit LLLexer(StringRef StartBuf, SourceMgr &SM, SMDiagnostic &,
                      LLVMContext &C);
 
-    lltok::Kind Lex() {
-      return CurKind = LexToken();
-    }
+    lltok::Kind Lex() { return CurKind = LexToken(); }
 
     typedef SMLoc LocTy;
     LocTy getLoc() const { return SMLoc::getFromPointer(TokStart); }
@@ -79,6 +91,21 @@ namespace llvm {
       IgnoreColonInIdentifiers = val;
     }
 
+    // Get the current line number, zero-indexed
+    unsigned getLineNum() { return CurLineNum; }
+    // Get the current column number, zero-indexed
+    unsigned getColNum() { return CurColNum; }
+    // Get the line number of the start of the current token, zero-indexed
+    unsigned getTokLineNum() { return CurTokLineNum; }
+    // Get the column number of the start of the current token, zero-indexed
+    unsigned getTokColNum() { return CurTokColNum; }
+    // Get the line number of the end of the previous token, zero-indexed,
+    // exclusive
+    unsigned getPrevTokEndLineNum() { return PrevTokEndLineNum; }
+    // Get the column number of the end of the previous token, zero-indexed,
+    // exclusive
+    unsigned getPrevTokEndColNum() { return PrevTokEndColNum; }
+
     // This returns true as a convenience for the parser functions that return
     // true on error.
     bool ParseError(LocTy ErrorLoc, const Twine &Msg) {
@@ -93,7 +120,12 @@ namespace llvm {
   private:
     lltok::Kind LexToken();
 
+    // Return closest pointer after `Ptr` that is an end of a label.
+    // Returns nullptr if `Ptr` doesn't point into a label.
+    const char *getLabelTail(const char *Ptr);
     int getNextChar();
+    const char *skipNChars(unsigned N);
+    void advancePositionTo(const char *Ptr);
     void SkipLineComment();
     bool SkipCComment();
     lltok::Kind ReadString(lltok::Kind kind);
diff --git a/llvm/include/llvm/AsmParser/LLParser.h b/llvm/include/llvm/AsmParser/LLParser.h
index c01de4a289a69..02460e5e52203 100644
--- a/llvm/include/llvm/AsmParser/LLParser.h
+++ b/llvm/include/llvm/AsmParser/LLParser.h
@@ -13,6 +13,7 @@
 #ifndef LLVM_ASMPARSER_LLPARSER_H
 #define LLVM_ASMPARSER_LLPARSER_H
 
+#include "AsmParserContext.h"
 #include "LLLexer.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/AsmParser/NumberedValues.h"
@@ -177,6 +178,9 @@ namespace llvm {
     // Map of module ID to path.
     std::map<unsigned, StringRef> ModuleIdMap;
 
+    /// Keeps track of source locations for Values, BasicBlocks, and Functions
+    AsmParserContext *ParserContext;
+
     /// Only the llvm-as tool may set this to false to bypass
     /// UpgradeDebuginfo so it can generate broken bitcode.
     bool UpgradeDebugInfo;
@@ -189,10 +193,11 @@ namespace llvm {
   public:
     LLParser(StringRef F, SourceMgr &SM, SMDiagnostic &Err, Module *M,
              ModuleSummaryIndex *Index, LLVMContext &Context,
-             SlotMapping *Slots = nullptr)
+             SlotMapping *Slots = nullptr,
+             AsmParserContext *ParserContext = nullptr)
         : Context(Context), OPLex(F, SM, Err, Context),
           Lex(F, SM, Err, Context), M(M), Index(Index), Slots(Slots),
-          BlockAddressPFS(nullptr) {}
+          BlockAddressPFS(nullptr), ParserContext(ParserContext) {}
     bool Run(
         bool UpgradeDebugInfo,
         DataLayoutCallbackTy DataLayoutCallback = [](StringRef, StringRef) {
diff --git a/llvm/include/llvm/AsmParser/Parser.h b/llvm/include/llvm/AsmParser/Parser.h
index c900b79665404..22b0881d92b53 100644
--- a/llvm/include/llvm/AsmParser/Parser.h
+++ b/llvm/include/llvm/AsmParser/Parser.h
@@ -15,6 +15,7 @@
 
 #include "llvm/ADT/STLFunctionalExtras.h"
 #include "llvm/ADT/StringRef.h"
+#include "llvm/AsmParser/AsmParserContext.h"
 #include "llvm/Support/Compiler.h"
 #include <memory>
 #include <optional>
@@ -62,7 +63,8 @@ parseAssemblyFile(StringRef Filename, SMDiagnostic &Err, LLVMContext &Context,
 ///              parsing.
 LLVM_ABI std::unique_ptr<Module>
 parseAssemblyString(StringRef AsmString, SMDiagnostic &Err,
-                    LLVMContext &Context, SlotMapping *Slots = nullptr);
+                    LLVMContext &Context, SlotMapping *Slots = nullptr,
+                    AsmParserContext *ParserContext = nullptr);
 
 /// Holds the Module and ModuleSummaryIndex returned by the interfaces
 /// that parse both.
@@ -128,9 +130,9 @@ parseSummaryIndexAssemblyString(StringRef AsmString, SMDiagnostic &Err);
 LLVM_ABI std::unique_ptr<Module> parseAssembly(
     MemoryBufferRef F, SMDiagnostic &Err, LLVMContext &Context,
     SlotMapping *Slots = nullptr,
-    DataLayoutCallbackTy DataLayoutCallback = [](StringRef, StringRef) {
-      return std::nullopt;
-    });
+    DataLayoutCallbackTy DataLayoutCallback =
+        [](StringRef, StringRef) { return std::nullopt; },
+    AsmParserContext *ParserContext = nullptr);
 
 /// Parse LLVM Assembly including the summary index from a MemoryBuffer.
 ///
@@ -169,9 +171,9 @@ parseSummaryIndexAssembly(MemoryBufferRef F, SMDiagnostic &Err);
 LLVM_ABI bool parseAssemblyInto(
     MemoryBufferRef F, Module *M, ModuleSummaryIndex *Index, SMDiagnostic &Err,
     SlotMapping *Slots = nullptr,
-    DataLayoutCallbackTy DataLayoutCallback = [](StringRef, StringRef) {
-      return std::nullopt;
-    });
+    DataLayoutCallbackTy DataLayoutCallback =
+        [](StringRef, StringRef) { return std::nullopt; },
+    AsmParserContext *ParserContext = nullptr);
 
 /// Parse a type and a constant value in the given string.
 ///
diff --git a/llvm/include/llvm/IR/Value.h b/llvm/include/llvm/IR/Value.h
index 04d0391c04098..9e27797d151e2 100644
--- a/llvm/include/llvm/IR/Value.h
+++ b/llvm/include/llvm/IR/Value.h
@@ -55,6 +55,38 @@ class User;
 
 using ValueName = StringMapEntry<Value *>;
 
+struct FileLoc {
+  unsigned Line;
+  unsigned Col;
+
+  bool operator<=(const FileLoc &RHS) const {
+    return Line < RHS.Line || (Line == RHS.Line && Col <= RHS.Col);
+  }
+
+  bool operator<(const FileLoc &RHS) const {
+    return Line < RHS.Line || (Line == RHS.Line && Col < RHS.Col);
+  }
+
+  FileLoc(unsigned L, unsigned C) : Line(L), Col(C) {}
+};
+
+struct FileLocRange {
+  FileLoc Start;
+  FileLoc End;
+
+  FileLocRange() : Start(0, 0), End(0, 0) {}
+
+  FileLocRange(FileLoc S, FileLoc E) : Start(S), End(E) {
+    assert(Start <= End);
+  }
+
+  bool contains(FileLoc L) const { return Start <= L && L <= End; }
+
+  bool contains(FileLocRange LR) const {
+    return contains(LR.Start) && contains(LR.End);
+  }
+};
+
 //===----------------------------------------------------------------------===//
 //                                 Value Class
 //===----------------------------------------------------------------------===//
diff --git a/llvm/include/llvm/IRReader/IRReader.h b/llvm/include/llvm/IRReader/IRReader.h
index 790140f19934e..00cf12d342ae0 100644
--- a/llvm/include/llvm/IRReader/IRReader.h
+++ b/llvm/include/llvm/IRReader/IRReader.h
@@ -15,6 +15,7 @@
 #define LLVM_IRREADER_IRREADER_H
 
 #include "llvm/ADT/StringRef.h"
+#include "llvm/AsmParser/AsmParserContext.h"
 #include "llvm/Bitcode/BitcodeReader.h"
 #include "llvm/Support/Compiler.h"
 #include <memory>
@@ -50,19 +51,19 @@ getLazyIRFileModule(StringRef Filename, SMDiagnostic &Err, LLVMContext &Context,
 /// for it.  Otherwise, attempt to parse it as LLVM Assembly and return
 /// a Module for it.
 /// \param DataLayoutCallback Override datalayout in the llvm assembly.
-LLVM_ABI std::unique_ptr<Module> parseIR(MemoryBufferRef Buffer,
-                                         SMDiagnostic &Err,
-                                         LLVMContext &Context,
-                                         ParserCallbacks Callbacks = {});
+LLVM_ABI std::unique_ptr<Module>
+parseIR(MemoryBufferRef Buffer, SMDiagnostic &Err, LLVMContext &Context,
+        ParserCallbacks Callbacks = {},
+        AsmParserContext *ParserContext = nullptr);
 
 /// If the given file holds a bitcode image, return a Module for it.
 /// Otherwise, attempt to parse it as LLVM Assembly and return a Module
 /// for it.
 /// \param DataLayoutCallback Override datalayout in the llvm assembly.
-LLVM_ABI std::unique_ptr<Module> parseIRFile(StringRef Filename,
-                                             SMDiagnostic &Err,
-                                             LLVMContext &Context,
-                                             ParserCallbacks Callbacks = {});
+LLVM_ABI std::unique_ptr<Module>
+parseIRFile(StringRef Filename, SMDiagnostic &Err, LLVMContext &Context,
+            ParserCallbacks Callbacks = {},
+            AsmParserContext *ParserContext = nullptr);
 }
 
 #endif
diff --git a/llvm/lib/AsmParser/AsmParserContext.cpp b/llvm/lib/AsmParser/AsmParserContext.cpp
new file mode 100644
index 0000000000000..f5e3d83f5d346
--- /dev/null
+++ b/llvm/lib/AsmParser/AsmParserContext.cpp
@@ -0,0 +1,91 @@
+//===-- AsmParserContext.cpp ------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/AsmParser/AsmParserContext.h"
+
+namespace llvm {
+
+std::optional<FileLocRange>
+AsmParserContext::getFunctionLocation(const Function *F) const {
+  if (!Functions.contains(F))
+    return std::nullopt;
+  return Functions.at(F);
+}
+
+std::optional<FileLocRange>
+AsmParserContext::getBlockLocation(const BasicBlock *BB) const {
+  if (!Blocks.contains(BB))
+    return std::nullopt;
+  return Blocks.at(BB);
+}
+
+std::optional<FileLocRange>
+AsmParserContext::getInstructionLocation(const Instruction *I) const {
+  if (!Instructions.contains(I))
+    return std::nullopt;
+  return Instructions.at(I);
+}
+
+std::optional<Function *>
+AsmParserContext::getFunctionAtLocation(const FileLocRange &Query) const {
+  for (auto &[F, Loc] : Functions) {
+    if (Loc.contains(Query))
+      return F;
+  }
+  return std::nullopt;
+}
+
+std::optional<Function *>
+AsmParserContext::getFunctionAtLocation(const FileLoc &Query) const {
+  return getFunctionAtLocation(FileLocRange(Query, Query));
+}
+
+std::optional<BasicBlock *>
+AsmParserContext::getBlockAtLocation(const FileLocRange &Query) const {
+  for (auto &[BB, Loc] : Blocks) {
+    if (Loc.contains(Query))
+      return BB;
+  }
+  return std::nullopt;
+}
+
+std::optional<BasicBlock *>
+AsmParserContext::getBlockAtLocation(const FileLoc &Query) const {
+  return getBlockAtLocation(FileLocRange(Query, Query));
+}
+
+std::optional<Instruction *>
+AsmParserContext::getInstructionAtLocation(const FileLocRange &Query) const {
+  for (auto &[I, Loc] : Instructions) {
+    if (Loc.contains(Query))
+      return I;
+  }
+  return std::nullopt;
+}
+
+std::optional<Instruction *>
+AsmParserContext::getInstructionAtLocation(const FileLoc &Query) const {
+  return getInstructionAtLocation(FileLocRange(Query, Query));
+}
+
+bool AsmParserContext::addFunctionLocation(Function *F,
+                                           const FileLocRange &Loc) {
+  return Functions.insert({F, Loc}).second;
+}
+
+bool AsmParserContext::addBlockLocation(BasicBlock *BB,
+                                        const FileLocRange &Loc) {
+  return Blocks.insert({BB, Loc}).second;
+}
+
+bool AsmParserContext::addInstructionLocation(Instruction *I,
+                                              const FileLocRange &Loc) {
+  return Instructions.insert({I, Loc}).second;
+}
+
+} // namespace llvm
diff --git a/llvm/lib/AsmParser/CMakeLists.txt b/llvm/lib/AsmParser/CMakeLists.txt
index 20d0c50a029ca..dcfcc06f093a7 100644
--- a/llvm/lib/AsmParser/CMakeLists.txt
+++ b/llvm/lib/AsmParser/CMakeLists.txt
@@ -1,5 +1,6 @@
 # AsmParser
 add_llvm_component_library(LLVMAsmParser
+  AsmParserContext.cpp
   LLLexer.cpp
   LLParser.cpp
   Parser.cpp
diff --git a/llvm/lib/AsmParser/LLLexer.cpp b/llvm/lib/AsmParser/LLLexer.cpp
index 3d5bd6155536e..be5b2b9bce0ca 100644
--- a/llvm/lib/AsmParser/LLLexer.cpp
+++ b/llvm/lib/AsmParser/LLLexer.cpp
@@ -155,15 +155,6 @@ static bool isLabelChar(char C) {
          C == '.' || C == '_';
 }
 
-/// isLabelTail - Return true if this pointer points to a valid end of a label.
-static const char *isLabelTail(const char *CurPtr) {
-  while (true) {
-    if (CurPtr[0] == ':') return CurPtr+1;
-    if (!isLabelChar(CurPtr[0])) return nullptr;
-    ++CurPtr;
-  }
-}
-
 //===----------------------------------------------------------------------===//
 // Lexer definition.
 //===----------------------------------------------------------------------===//
@@ -174,27 +165,75 @@ LLLexer::LLLexer(StringRef StartBuf, SourceMgr &SM, SMDiagnostic &Err,
   CurPtr = CurBuf.begin();
 }
 
+const char *LLLexer::getLabelTail(const char *Ptr) {
+  while (Ptr != CurBuf.end()) {
+    if (Ptr[0] == ':')
+      return Ptr + 1;
+    if (!isLabelChar(Ptr[0]))
+      return nullptr;
+    ++Ptr;
+  }
+  return nullptr;
+}
+
 int LLLexer::getNextChar() {
-  char CurChar = *CurPtr++;
-  switch (CurChar) {
-  default: return (unsigned char)CurChar;
-  case 0:
-    // A nul character in the stream is either the end of the current buffer or
-    // a random nul in the file.  Disambiguate that here.
-    if (CurPtr-1 != CurBuf.end())
-      return 0;  // Just whitespace.
-
-    // Otherwise, return end of file.
-    --CurPtr;  // Another call to lex will return EOF again.
+  if (CurPtr == CurBuf.end())
     return EOF;
+  // Increment line number if this is the first character after a newline
+  if (CurPtr > CurBuf.begin() && *(CurPtr - 1) == '\n') {
+    CurLineNum++;
+    CurColNum = 0;
+  } else
+    CurColNum++;
+  return *CurPtr++;
+}
+
+const char *LLLexer::skipNChars(unsigned N) {
+  while (N--)
+    getNextChar();
+  return CurPtr;
+}
+
+void LLLexer::advancePositionTo(const char *Ptr) {
+  bool RecalculateColumn = false;
+  while (CurPtr != Ptr) {
+    if (CurPtr > Ptr) {
+      --CurPtr;
+      --CurColNum;
+      // Since CurPtr is one char ahead of the stored position, check if the
+      // previous char is not a newline
+      if (CurPtr != CurBuf.begin() && *(CurPtr - 1) == '\n') {
+        --CurLineNum;
+        RecalculateColumn = true;
+      }
+    } else
+      getNextChar();
+  }
+  if (RecalculateColumn) {
+    CurColNum = 0;
+    // Count the number of chars to the previous newline or start of buffer
+    for (const char *Ptr = CurPtr; Ptr != CurBuf.begin() && *(Ptr - 1) != '\n';
+         --Ptr, ++CurColNum)
+      ;
   }
 }
 
 lltok::Kind LLLexer::LexToken() {
+  // Set token end to next location, since the end is
+  // exclusive
+  if (CurPtr != CurBuf.begin() && *(CurPtr - 1) == '\n') {
+    PrevTokEndLineNum = CurLineNum + 1;
+    PrevTokEndColNum = 0;
+  } else {
+    PrevTokEndLineNum = CurLineNum;
+    PrevTokEndColNum = CurColNum + 1;
+  }
   while (true) {
     TokStart = CurPtr;
-
     int CurChar = getNextChar();
+    CurTokColNum = CurColNum;
+    CurTokLineNum = CurLineNum;
+
     switch (CurChar) {
     default:
       // Handle letters: [a-zA-Z_]
@@ -215,13 +254,13 @@ lltok::Kind LLLexer::LexToken() {
     case '%': return LexPercent();
     case '"': return LexQuote();
     case '.':
-      if (const char *Ptr = isLabelTail(CurPtr)) {
-        CurPtr = Ptr;
+      if (const char *Ptr = getLabelTail(CurPtr)) {
+        advancePositionTo(Ptr);
         StrVal.assign(TokStart, CurPtr-1);
         return lltok::LabelStr;
       }
       if (CurPtr[0] == '.' && CurPtr[1] == '.') {
-        CurPtr += 2;
+        skipNChars(2);
         return lltok::dotdotdot;
       }
       return lltok::Error;
@@ -298,15 +337,15 @@ lltok::Kind LLLexer::LexAt() {
 }
 
 lltok::Kind LLLexer::LexDollar() {
-  if (const char *Ptr = isLabelTail(TokStart)) {
-    CurPtr = Ptr;
+  if (const char *Ptr = getLabelTail(TokStart)) {
+    advancePositionTo(Ptr);
     StrVal.assign(TokStart, CurPtr - 1);
     return lltok::LabelStr;
   }
 
   // Handle DollarStringConstant: $\"[^\"]*\"
   if (CurPtr[0] == '"') {
-    ++CurPtr;
+    getNextChar();
 
     while (true) {
       int CurChar = getNextChar();
@@ -358,11 +397,11 @@ bool LLLexer::ReadVarName() {
   if (isalpha(static_cast<unsigned char>(CurPtr[0])) ||
       CurPtr[0] == '-' || CurPtr[0] == '$' ||
       CurPtr[0] == '.' || CurPtr[0] == '_') {
-    ++CurPtr;
+    getNextChar();
     while (isalnum(static_cast<unsigned char>(CurPtr[0])) ||
            CurPtr[0] == '-' || CurPtr[0] == '$' ||
            CurPtr[0] == '.' || CurPtr[0] == '_')
-      ++CurPtr;
+      getNextChar();
 
     StrVal.assign(NameStart, CurPtr);
     return true;
@@ -376,7 +415,8 @@ lltok::Kind LLLexer::LexUIntID(lltok::Kind Token) {
   if (!isdigit(static_cast<unsigned char>(CurPtr[0])))
     return lltok::Error;
 
-  for (++CurPtr; isdigit(static_cast<unsigned char>(CurPtr[0])); ++CurPtr)
+  for (getNextChar(); isdigit(static_cast<u...
[truncated]

llvm/include/llvm/AsmParser/AsmParserContext.h

llvm/lib/AsmParser/LLLexer.cpp

mshockwave · 2025-09-03T18:31:17Z

llvm/include/llvm/AsmParser/AsmParserContext.h

+  std::optional<Function *> getFunctionAtLocation(const FileLocRange &) const;
+  std::optional<Function *> getFunctionAtLocation(const FileLoc &) const;
+  std::optional<BasicBlock *> getBlockAtLocation(const FileLocRange &) const;
+  std::optional<BasicBlock *> getBlockAtLocation(const FileLoc &) const;
+  std::optional<Instruction *>
+  getInstructionAtLocation(const FileLocRange &) const;
+  std::optional<Instruction *> getInstructionAtLocation(const FileLoc &) const;


could we use nullptr here to indicate absence?

mshockwave · 2025-09-03T18:31:50Z

llvm/lib/AsmParser/AsmParserContext.cpp

+  if (!Blocks.contains(BB))
+    return std::nullopt;
+  return Blocks.at(BB);


nit: perhaps we can use DenseMap::lookup_or here

Thanks for that suggestion, makes the code a bit cleaner.

But it sadly is not possible. Because the return type of lookup_or is the value type of the map, so it can't return a nullopt

mshockwave · 2025-09-03T18:33:44Z

llvm/unittests/AsmParser/AsmParserTest.cpp

+      dbgs() << #Loc1 " location: " << Loc1.Start.Line << ":"                  \
+             << Loc1.Start.Col << " - " << Loc1.End.Line << ":"                \
+             << Loc1.End.Col << "\n";                                          \
+      dbgs() << #Loc2 " location: " << Loc2.Start.Line << ":"                  \
+             << Loc2.Start.Col << " - " << Loc2.End.Line << ":"                \
+             << Loc2.End.Col << "\n";                                          \


please wrap these line with LLVM_DEBUG. Though it's a bit unusual to have debug prints in a unit test

gtest has it's own support for printing additional debugging information if an expectation fails. In the simplest case you can write something like ASSERT_TRUE(AreLocsEqual) << .... As this test is not hot, you shouldn't need the more sophisticated methods.

Also EXPECT seems more appropriate than ASSERT here.

Bertik23 · 2025-09-11T17:49:50Z

Gentle ping @mshockwave could you please have a look at the changes I did

mshockwave

Generally looks good to me. Could you tag other potential reviewers (and I'll help you to add them as I figured you probably don't have the permissions to do that yet) to give another pass?

Bertik23 · 2025-09-12T09:06:34Z

I have not really an idea, most code this PR changes is from the initial lexer from Chris
Maybe @jurahul might be interested

mshockwave · 2025-09-12T16:37:05Z

llvm/unittests/AsmParser/AsmParserTest.cpp

 #include "llvm/Support/SourceMgr.h"
 #include "gtest/gtest.h"

+#define DEBUG_TYPE "Unittest-asm-parser-tests"


nit: it's relatively rare to capitalize DEBUG_TYPE nowadays

Suggested change

#define DEBUG_TYPE "Unittest-asm-parser-tests"

#define DEBUG_TYPE "unittest-asm-parser-tests"

arichardson · 2025-09-15T21:11:58Z

llvm/include/llvm/AsmParser/AsmParserContext.h

+///
+/// This information is optionally emitted by the LLParser while
+/// it reads LLVM textual IR.
+class AsmParserContext {


The name of this class makes it sound like something that should be owned by the LLParser. Maybe something like LLParserLocationInfo?

I think AsmParserContext is better, as it's more future proof. In the future we might want to add some more info, other than location. As for the LLParser vs AsmParser, I think AsmParser is better since it uses what is parsed other then how it's parserd.

I tend to agree with @arichardson -- I'd expect AsmParserContext to be something internally used by AsmParser. What kind of future extensions do you have in mind here?

It is supposed to be used similarly to AsmParserState in MLIR, In tools, that need to know some things from the parser, in this case we want to use it to be able to access locations or instructions, functions, ... in IR files for the purposes of navigating in the file via a LSP server (such as goto definition/references)

llvm/include/llvm/AsmParser/AsmParserContext.h

llvm/include/llvm/IR/Value.h

llvm/include/llvm/AsmParser/AsmParserContext.h

nikic · 2025-10-05T10:25:25Z

llvm/include/llvm/AsmParser/AsmParserContext.h

@@ -0,0 +1,70 @@
+//===-- AsmParserContext.h --------------------------------------*- C++ -*-===//


Omit file name and emacs marker in new files.

llvm/include/llvm/AsmParser/AsmParserContext.h

nikic · 2025-10-05T10:27:11Z

llvm/include/llvm/AsmParser/AsmParserContext.h

+///
+/// This information is optionally emitted by the LLParser while
+/// it reads LLVM textual IR.
+class AsmParserContext {


I tend to agree with @arichardson -- I'd expect AsmParserContext to be something internally used by AsmParser. What kind of future extensions do you have in mind here?

llvm/include/llvm/AsmParser/FileLoc.h

nikic · 2025-10-05T19:28:49Z

llvm/lib/AsmParser/AsmParserContext.cpp

+Function *
+AsmParserContext::getFunctionAtLocation(const FileLocRange &Query) const {
+  for (auto &[F, Loc] : Functions) {
+    if (Loc.contains(Query))


This looks very inefficient...

It's actually not that bad. When testing this even on large files the LSP was pretty responsive. So I wouldn't consider this an issue for now.

Probably still makes sense to switch this to do a binary search.

I agree. I'd suggest making this change in a follow up PR.

llvm/lib/AsmParser/LLLexer.cpp

nikic · 2025-10-05T19:33:33Z

llvm/unittests/AsmParser/AsmParserTest.cpp

+      dbgs() << #Loc1 " location: " << Loc1.Start.Line << ":"                  \
+             << Loc1.Start.Col << " - " << Loc1.End.Line << ":"                \
+             << Loc1.End.Col << "\n";                                          \
+      dbgs() << #Loc2 " location: " << Loc2.Start.Line << ":"                  \
+             << Loc2.Start.Col << " - " << Loc2.End.Line << ":"                \
+             << Loc2.End.Col << "\n";                                          \


gtest has it's own support for printing additional debugging information if an expectation fails. In the simplest case you can write something like ASSERT_TRUE(AreLocsEqual) << .... As this test is not hot, you shouldn't need the more sophisticated methods.

nikic · 2025-10-05T19:35:26Z

llvm/unittests/AsmParser/AsmParserTest.cpp

+      dbgs() << #Loc1 " location: " << Loc1.Start.Line << ":"                  \
+             << Loc1.Start.Col << " - " << Loc1.End.Line << ":"                \
+             << Loc1.End.Col << "\n";                                          \
+      dbgs() << #Loc2 " location: " << Loc2.Start.Line << ":"                  \
+             << Loc2.Start.Col << " - " << Loc2.End.Line << ":"                \
+             << Loc2.End.Col << "\n";                                          \


Also EXPECT seems more appropriate than ASSERT here.

llvm/unittests/AsmParser/AsmParserTest.cpp

Co-authored-by: Nikita Popov <[email protected]>

Bertik23 · 2025-10-08T13:51:26Z

I hope I have addressed / responded all of the issues that nikic addressed

arichardson · 2025-10-08T22:18:10Z

llvm/lib/AsmParser/AsmParserContext.cpp

+
+Instruction *
+AsmParserContext::getInstructionAtLocation(const FileLocRange &Query) const {
+  for (auto &[I, Loc] : Instructions) {


Iterating over all instructions seems rather inefficient e.g. if you're trying to get info towards the end of large file. Can't we have a list sorted by file location and do a binary search?

Maybe it doesn't matter if the lookup by function is more common. Do you have a link to code using this new class?

I'm not worried about performance for the LSP, optimizations can always be done later. Please don't interpret my review comments as negative towards this change, I am very excited to see this support land eventually.

I have a draft PR with the LSP server #161969

Bertik23 · 2025-10-22T12:48:35Z

Gentle ping @arichardson @nikic
Do you have anything to add or want to see some changes?

nikic

LGTM

llvm/include/llvm/AsmParser/LLLexer.h

nikic · 2025-10-22T13:20:38Z

llvm/lib/AsmParser/AsmParserContext.cpp

+Function *
+AsmParserContext::getFunctionAtLocation(const FileLocRange &Query) const {
+  for (auto &[F, Loc] : Functions) {
+    if (Loc.contains(Query))


Probably still makes sense to switch this to do a binary search.

nikic · 2025-10-22T13:22:01Z

llvm/include/llvm/AsmParser/FileLoc.h

+  FileLoc Start;
+  FileLoc End;
+
+  FileLocRange() : Start(0, 0), End(0, 0) {}


Where is this ctor used?

It's used in the LSP server we are working on.

llvm/lib/AsmParser/LLLexer.cpp

Bertik23 · 2025-10-22T14:09:51Z

I don't have commit access, so if you could land the changes once you think it's ready, that would be great.

nikic · 2025-10-22T14:27:26Z

Can you please update the PR description? It looks like some bits are outdated, e.g. about fixing OOB accesses in the Lexer.

Bertik23 · 2025-10-22T14:30:54Z

Removed mentions of changes dropped from this PR.

This PR is part of the LLVM IR LSP server project ([RFC](https://discourse.llvm.org/t/rfc-ir-visualization-with-vs-code-extension-using-an-lsp-server/87773)) To be able to make a LSP server, it's crucial to have location information about the LLVM objects (Functions, BasicBlocks and Instructions). This PR adds: * Position tracking to the Lexer * A new AsmParserContext class, to hold the new position info * Tests to check if the location is correct The AsmParserContext can be passed as an optional parameter into the parser. Which populates it and it can be then used by other tools, such as the LSP server. The AsmParserContext idea was borrowed from MLIR. As we didn't want to store data no one else uses inside the objects themselves. But the implementation is different, this class holds several maps of Functions, BasicBlocks and Instructions, to map them to their location. And some utility methods were added to get the positions of the processed tokens.

Bertik23 and others added 7 commits August 4, 2025 09:14

Update CurLineNum anc CurColNum in sync with movement in text

ecf591c

Remove remains from cherry pick from LSP branch

06926e9

Make isLabelTail more safe and rename it to better show what it does

1fdf13c

Remove dangling comment

2772cd8

Fix typo

b05d11a

Co-authored-by: Nikita Popov <[email protected]>

Add location tracking to IR parser

458599b

Fix some OOB posibilities

ee39ed1

Fix clang format

b0c5318

Bertik23 marked this pull request as ready for review August 28, 2025 11:41

llvmbot added the llvm:ir label Aug 28, 2025

mshockwave reviewed Aug 28, 2025

View reviewed changes

llvm/include/llvm/AsmParser/AsmParserContext.h Show resolved Hide resolved

llvm/include/llvm/AsmParser/AsmParserContext.h Outdated Show resolved Hide resolved

Move private members to top of class definition

416514e

mshockwave reviewed Aug 29, 2025

View reviewed changes

llvm/lib/AsmParser/LLLexer.cpp Outdated Show resolved Hide resolved

Bertik23 added 3 commits September 2, 2025 12:12

Use SourceMgr to resolve Line:Column position

35ca1a5

Fix zeroindexing on token positions

b3d8254

Replace Line:Column storage with Poiters and on demand conversion

23dcc6b

mshockwave reviewed Sep 3, 2025

View reviewed changes

Bertik23 added 2 commits September 4, 2025 09:11

Use nullptr as missing value

06d5265

Enclose debug prints of tests in LLVM_DEBUG

4e08921

mshockwave requested a review from nikic September 11, 2025 17:57

mshockwave reviewed Sep 11, 2025

View reviewed changes

mshockwave reviewed Sep 12, 2025

View reviewed changes

mshockwave requested review from efriedma-quic and jurahul September 12, 2025 16:39

Decapitalize DEBUG_TYPE

3da9e9d

arichardson reviewed Sep 15, 2025

View reviewed changes

nikic reviewed Oct 5, 2025

View reviewed changes

Bertik23 and others added 17 commits October 6, 2025 15:20

Fix typo in comment

f5da73c

Co-authored-by: Nikita Popov <[email protected]>

Path to llvm/AsmParser/FileLoc.h

10a2b75

Co-authored-by: Nikita Popov <[email protected]>

assert.h -> cassert

17b5753

Co-authored-by: Nikita Popov <[email protected]>

Remove filename and emacs marker

737c5e0

optimize lookup

72b89e5

FileLoc docs and fix reange

41284df

full path to includes

ff9a33d

Apply suggestion from @nikic

008ae63

Co-authored-by: Nikita Popov <[email protected]>

Typo add period

a44ef20

Co-authored-by: Nikita Popov <[email protected]>

actually fix filelocrange openness

f201d1f

remove old irrelevant comment

1de2447

Doc coments with ///

4d51839

Doc coments with ///

77385c0

Merge remote-tracking branch 'upstream/main' into parser-location-info

f58e0ad

Revert changes irrelevant in LLLexer

75e5b57

revert formating

07689fc

make clang-format happy

66ce6b6

arichardson reviewed Oct 8, 2025

View reviewed changes

nikic approved these changes Oct 22, 2025

View reviewed changes

typo fix and inline decrement

c0ac9d6

nikic merged commit 18d4ba5 into llvm:main Oct 22, 2025
9 of 10 checks passed

	#define DEBUG_TYPE "Unittest-asm-parser-tests"
	#define DEBUG_TYPE "unittest-asm-parser-tests"

		@@ -0,0 +1,70 @@
		//===-- AsmParserContext.h --------------------------------------- C++ --===//

[LLVM][IR] Add location tracking to LLVM IR parser #155797

[LLVM][IR] Add location tracking to LLVM IR parser #155797

Uh oh!

Conversation

Bertik23 commented Aug 28, 2025 • edited by nikic Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 28, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Bertik23 commented Sep 11, 2025

Uh oh!

mshockwave left a comment

Choose a reason for hiding this comment

Uh oh!

Bertik23 commented Sep 12, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Bertik23 commented Oct 8, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Bertik23 commented Aug 28, 2025 •

edited by nikic

Loading

github-actions bot commented Aug 28, 2025 •

edited

Loading